Competing neural networks: 
Finding a strategy for the game of matching pennies 
The ability of a deterministic, plastic system to learn to imitate stochastic behavior is analyzed.
Two neural networks -actually, two perceptrons- are set to play a zero-sum game against each
other. The competition, by acting as a kind of mutually supervised learning, drives the networks to
produce an approximation to the optimal strategy, that is to say, a random signal.
I. INTRODUCTION

Since the connection between disordered spin systems
and symmetric binary neural networks was drawn, intensive
theoretical, numerical and experimental research
has been devoted to this field within physics, and at the
boundary of physics with biology and information theory,
among others. From the viewpoint of the study
of dynamical systems, neural networks are a special kind
of distributed active systems, which in their most impressive
realization -the brain- are able to display extremely
sophisticated collective behavior. Current models
have of course much more modest scopes but, in spite
of their simplicity, they have been able to imitate some
basic features of cognitive processes. These models have
also been extended to perform specific tasks, such as
process control and forecasting.

A basic capability of a wide class of neural-network 
models is that of learning, i.e. the possibility of modi- 
fying the internal architecture of the network to adapt 
its dynamics to an expected response. This process can 
take a variety of forms, to be chosen according to the 
aims of the model. Pattern storing and recognition -the
so-called associative memory- is perhaps the best known.
Another well-known instance is learning by generalization.
In this case, the network is exposed to some
input information and the output is compared with the 
expected response. Errors are usually backpropagated 
to modify the network dynamics through a change in its 
architecture. The network thus learns from experience. 
It is expected that after a certain learning transient the 
network is able to produce the correct output even from 
inputs not included in the learning sample. This kind of 
learning can be carried out under supervision, or the system
can be designed to learn in an unsupervised manner,
by means of a self-organization mechanism.

In this paper, we explore a neural-network model of 
the learning that takes place during a competitive game. 
Competitive games have recently attracted a great deal
of attention among physicists as simple models of adaptive
evolution and self-organization in biological, social,
and economic systems. Neural networks have been
designed and trained to play some highly complex games
such as chess and backgammon. The complexity of
these games, however, does not allow a systematic analysis
of the learning process or a statistical evaluation of
the performance accomplished. On the other hand, games
that are too simple -such as those that admit a pure optimal
strategy- should be readily solved by a suitably designed
neural network. In fact, finding a pure strategy
can be associated with a maximization problem.

Here, we focus on an intermediate level,
choosing a competitive zero-sum game with very simple
rules but lacking a pure optimal strategy, namely the game
of matching pennies. Two neural networks are left to
repeatedly play the game against each other. The successive
game results are used on-line to feed the learning
mechanism of the two players. As in the case of human
players, each network tries to guess the strategy of its
opponent and, thus, competition becomes a kind of mutual
supervision. The optimal strategy for the game of
matching pennies is a purely stochastic one. Thus, the
challenge for the networks, whose dynamics is fully deterministic,
consists in approximating a random evolution
as closely as possible. Our analysis of the time series generated
during the game shows that even small networks
with simple architectures do quite well -probably better
than any human being (not using a randomizing device).

In the next section we describe in detail the game of 
matching pennies, and specify the architecture and learning
dynamics of the competing neural networks. Section III
is devoted to the study of the model as a time-discrete
dynamical system -a mapping- with emphasis
on its phase-space evolution. In Sect. IV, we analyze
statistical properties of the dynamics during the game, 
evaluating the performance of the networks within an 
information-theory approach. Finally, we discuss our re- 
sults and consider some possible extensions. 
II. THE GAME AND THE PLAYERS



In the game of matching pennies, player I chooses
between two possible instances, say "heads" or "tails."
Player II, not knowing player I's choice, also chooses either
"heads" or "tails." Then, the two choices are disclosed
-for example, each player showing a penny- and,
if they are the same, player I pays one cent to player
II. If, on the contrary, the choices are different,
II pays one cent to I. The procedure is then repeated for a
large number of rounds, which has for instance been defined
by a previous agreement between the players. In
a less symmetric but very well-known realization of the
same game, player II must guess in which hand player
I has hidden a coin or any other small object. The pay-off
rules are the same as for the game of matching pennies.
Since at each round player I's loss (or gain) equals player
II's gain (or loss), this is a zero-sum game. In game theory,
a two-player zero-sum game is said to be a "strictly
competitive" game.

As the game proceeds, we expect the two players to try
to outguess each other, keeping their own strategies secret.
Due to the high symmetry of the game of matching
pennies, however, there is no optimal pure strategy for
either player. Of course, it would be a very poor strategy
for any player to choose the same instance at every
time step. Moreover, any deterministic way of deciding
which instance should be chosen at a given time
step could be uncovered by the opponent in the long run.
On the other hand, trying to guess the opponent's strategy
could lead to an unsolvable, infinitely involved problem.
As illustrated in , we may picture player I as thinking: 
"People usually choose heads; hence II will expect me to 
choose heads and choose heads himself, and so I should 
choose tails. But perhaps II is reasoning along the same 
line: he'll expect me to choose tails, and so I'd better 
choose heads. But perhaps that is II's reasoning, so..."
In this way, it becomes impossible to determine a strategy
in which either player could be confident. It follows
that it is necessary for both players to introduce a mixed
stochastic strategy where, at each time step, each player
chooses an instance at random, with a certain probability
distribution. The symmetry of the present game
indicates clearly that the best strategy for both players
is to choose heads or tails with equal probability. In the
long run, this ensures a zero average gain, whereas any
other strategy implies a net gain for the opponent.

Our aim here is to study, as a dynamical system, a pair
of competing neural networks playing the game of matching
pennies. In particular, we are interested in analyzing
whether the dynamics implies learning of an efficient
strategy "on-line," i.e. as the game proceeds. Since the
network dynamics and the learning algorithm considered
in the following are deterministic, it cannot be expected
that the networks will find the optimal (stochastic) strategy.
However, the networks might be
able to approximate it by means of a complex deterministic
dynamics over a sufficiently long period. The basic
idea in the learning process is that the playing strategy
of each network should emerge from trying to guess the
opponent's strategy. This is in fact the mechanism expected
to drive the game between human players: though
a general analysis of the game shows that the best way
of playing is at random, each player tries to outguess the
other assuming a deterministic strategy, at least in the
short term. The way of playing therefore derives from a
(somewhat paradoxical) cooperative mechanism during
the contest, where each player "supervises" the learning
of the other.

FIG. 1. Two competing perceptrons.

As for the architecture of each neural network, we take
the simplest model, namely, the perceptron, which is
reviewed in standard books on neural networks
(see, for example, Ref. [13]). It consists of a collection
of $N$ inputs $s_i(t)$ and of $N$ synaptic weights $w_i(t)$
($i = 1, \dots, N$) which define, at each time step, a single
output $\sigma(t)$ as

$$
\sigma(t) = S\left[ \sum_{i=1}^{N} w_i(t)\, s_i(t) \right].
\tag{1}
$$

Here $S$ is a step-shaped function, that we choose to be
$S(x) = \mathrm{sign}(x)$. Thus, $\sigma = \pm 1$. We associate each of these
two possible values of the output with the instance chosen
by the network at a given time step, say, $\sigma(t) = +1$ for
heads and $\sigma(t) = -1$ for tails.

We consider now two of these perceptrons (see Fig. 1),
both with $N$ inputs. At each time step, the output of one
of the perceptrons should be determined by the outputs
of the other at the preceding steps. Indeed, this is the
information available to each player on the strategy of
the opponent. We therefore associate the inputs $s_i^1$ of
perceptron I with the previous outputs $\sigma^2$ of perceptron
II and vice versa, as

$$
s_i^{1,2}(t) = \sigma^{2,1}(t - i),
\tag{2}
$$

$i = 1, \dots, N$. Time steps are of unit length.

Learning is a consequence of the comparison of the
outputs of the two perceptrons at each time. If the outputs
are identical perceptron II wins, and the synaptic
weights $w_i^1$ of perceptron I are modified to produce a better
prediction of the opponent's output at the next round.
Meanwhile, the synaptic weights $w_i^2$ of perceptron II can
be left invariant, as they have led this perceptron to win
the round. If, on the other hand, the outputs have been
different, $w_i^2$ are modified and $w_i^1$ are maintained. A
suitable algorithm for implementing this mechanism is
the standard perceptron learning rule, which in
our case implies

$$
w_i^1(t+1) = w_i^1(t) - \eta\, \Theta[\sigma^1(t)\sigma^2(t)]\, \sigma^2(t)\, s_i^1(t)
\tag{3}
$$

and

$$
w_i^2(t+1) = w_i^2(t) + \eta\, \Theta[-\sigma^1(t)\sigma^2(t)]\, \sigma^1(t)\, s_i^2(t),
\tag{4}
$$

with $i = 1, \dots, N$ and $\eta = (1 + N)^{-1}$. The Heaviside
function -where $\Theta(x) = 1$ for $x > 0$ and $\Theta(x) = 0$ for
$x < 0$- acts here as a mask, by selecting the perceptron
whose synaptic weights are to be modified.

Suppose that the successive outputs of perceptron I are
replaced by a periodic series of ±1. From the viewpoint
of perceptron II, this is interpreted as the opponent's
choice of a trivial strategy. As a matter of fact, the
perceptron convergence theorem [11,15] ensures that if
$N$ is large enough, i.e. if perceptron II's memory is sufficiently
long-ranged, the learning procedure stops and,
from then on, perceptron II wins all rounds. When the
period of the output series of perceptron I is shorter than
$N$, in fact, it can be straightforwardly shown that there is
at least one set of synaptic weights $w_i^2$ that makes perceptron
II able to win at every round. The number of steps
needed to compute these synaptic weights is of order $N^3$,
and can be tested numerically in our system. It is
therefore not expected that, when two large perceptrons
are left to play freely, one of them will adopt a short-period
strategy.
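
The rules of Eqs. (1) to (4) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names `sign` and `play`, the convention $S(0) = +1$, and the random initial inputs are assumptions that the text does not specify. The initial weights are drawn uniformly in $(-0.2, 0.2)$, as in the numerical experiments described later.

```python
import random


def sign(x):
    # Assumption: S(0) = +1 (the text only states S(x) = sign(x)).
    return 1 if x >= 0 else -1


def play(N=10, rounds=20000, seed=0):
    """Let two perceptrons play matching pennies; return both output series."""
    rng = random.Random(seed)
    eta = 1.0 / (N + 1)
    w1 = [rng.uniform(-0.2, 0.2) for _ in range(N)]
    w2 = [rng.uniform(-0.2, 0.2) for _ in range(N)]
    # Assumption: the input registers start with random +-1 values.
    s1 = [rng.choice((-1, 1)) for _ in range(N)]  # past outputs of II
    s2 = [rng.choice((-1, 1)) for _ in range(N)]  # past outputs of I
    out1, out2 = [], []
    for _ in range(rounds):
        o1 = sign(sum(w * s for w, s in zip(w1, s1)))  # Eq. (1), player I
        o2 = sign(sum(w * s for w, s in zip(w2, s2)))  # Eq. (1), player II
        if o1 == o2:
            # II wins, so only the weights of I are modified, Eq. (3).
            w1 = [w - eta * o2 * s for w, s in zip(w1, s1)]
        else:
            # I wins, so only the weights of II are modified, Eq. (4).
            w2 = [w + eta * o1 * s for w, s in zip(w2, s2)]
        # Eq. (2): each perceptron stores the opponent's last N outputs.
        s1 = [o2] + s1[:-1]
        s2 = [o1] + s2[:-1]
        out1.append(o1)
        out2.append(o2)
    return out1, out2


if __name__ == "__main__":
    a, b = play()
    wins_II = sum(x == y for x, y in zip(a, b)) / len(a)
    print("fraction of rounds won by II:", wins_II)
    print("mean output of I:", sum(a) / len(a))
```

For a good approximation to the optimal strategy, both printed quantities should stay close to 1/2 and 0, respectively; how close they actually get is what the statistical analysis of Sect. IV quantifies.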



III. THE SYSTEM AS A MAPPING: 
PHASE-SPACE DYNAMICS 



Equations (1) to (4) define the dynamics of our system.
They can be summarized in a $4N$-dimensional recursive mapping
for the perceptron inputs and the synaptic weights
only. The recursion equations are

$$
\begin{aligned}
s_1^{1,2}(t+1) &= S\Big[ \sum_{j=1}^{N} w_j^{2,1}(t)\, s_j^{2,1}(t) \Big], \\
s_i^{1,2}(t+1) &= s_{i-1}^{1,2}(t) \qquad (i = 2, \dots, N), \\
w_i^1(t+1) &= w_i^1(t) - \eta\, \Theta[s_1^1(t+1)\, s_1^2(t+1)]\, s_1^1(t+1)\, s_i^1(t) \qquad (i = 1, \dots, N), \\
w_i^2(t+1) &= w_i^2(t) + \eta\, \Theta[-s_1^1(t+1)\, s_1^2(t+1)]\, s_1^2(t+1)\, s_i^2(t) \qquad (i = 1, \dots, N).
\end{aligned}
\tag{5}
$$

The phase space corresponding to this mapping is discrete.
In fact, the inputs $s_i^{1,2}$ can adopt the two values
±1 only. Moreover, $w_i^{1,2}$ can have real values but they
vary on a discrete set, since according to Eqs. (3) and
(4) the variation of the synaptic weights has always the
same modulus, $|\Delta w_i^{1,2}| = \eta$. Once the initial synaptic
weights have been fixed, the discrete set of their possible
future values is completely determined.

During the evolution, the synaptic weights can in principle
run over an infinite set. However, though the synaptic
weights are not expected to converge to fixed values
but to continuously evolve as the game proceeds, it is
reasonable to conjecture that they will not perform arbitrarily
long excursions in phase space. To prove this
conjecture, let us consider in detail the evolution of the
synaptic weights, given by the last two equations in (5)
or, equivalently, by Eqs. (3) and (4). Using $\sigma^2 = \sigma^1$ when
the outputs coincide and $\sigma^1 = -\sigma^2$ when they differ, these
two equations can be written, respectively, as

$$
w_i^1(t+1) =
\begin{cases}
w_i^1(t) - \eta\, \sigma^1(t)\, s_i^1(t) & \text{if } \sigma^1(t) = \sigma^2(t), \\
w_i^1(t) & \text{if } \sigma^1(t) = -\sigma^2(t),
\end{cases}
\tag{6}
$$

and

$$
w_i^2(t+1) =
\begin{cases}
w_i^2(t) & \text{if } \sigma^1(t) = \sigma^2(t), \\
w_i^2(t) - \eta\, \sigma^2(t)\, s_i^2(t) & \text{if } \sigma^1(t) = -\sigma^2(t).
\end{cases}
\tag{7}
$$

We now select one of the perceptrons and restrict the
dynamics of its synaptic weights to the time steps where
they are effectively modified, by simply ignoring the steps
where no changes occur. The evolution equations can be
written in vectorial form as

$$
\mathbf{w}(t+1) = \mathbf{w}(t) - \eta\, S[\mathbf{w}(t) \cdot \mathbf{s}(t)]\, \mathbf{s}(t),
\tag{8}
$$

where the components of $\mathbf{w}$ and $\mathbf{s}$ are the synaptic
weights and the inputs of the selected perceptron, respectively.
We recall that $S(x)$ is the sign function. The
scalar product $\mathbf{w} \cdot \mathbf{s}$ is defined in the usual way, cf. Eq.
(1). Note that Eq. (8) holds for both perceptrons.

Let us now consider for a moment that, in Eq. (8), the
vector $\mathbf{s}$ is independent of time. Under this assumption it
is possible to reduce the system (8) to two equations for
the quantities $p(t) = \mathbf{s} \cdot \mathbf{w}(t)$ and $q(t) = |\mathbf{w}(t)|^2$, namely,

$$
\begin{aligned}
p(t+1) &= p(t) - (1 - \eta)\, S[p(t)], \\
q(t+1) &= q(t) - 2\eta\, |p(t)| + \eta(1 - \eta).
\end{aligned}
\tag{9}
$$



It can be easily seen from the first equation that $p(t)$
converges, after a certain transient, to a period-2 cycle.
The two values of $p$ on this cycle, $p_1$ and $p_2$, satisfy
the relation $p_2 = p_1 - 1 + \eta$. They depend on the initial
conditions, but are always restricted to the intervals
$0 < p_1 < 1 - \eta$ and $\eta - 1 < p_2 < 0$. Accordingly, $q(t)$
oscillates between two values, $q_1$ and $q_2$, defined by the
initial conditions and related by $q_2 = q_1 - 2\eta p_1 + \eta(1 - \eta)$.
After the transient, the modulus of the vector $\mathbf{w}$ is therefore
restricted to vary within the interval $[-W, W]$ with
$W = \max\{\sqrt{q_1}, \sqrt{q_2}\}$.

FIG. 2. Modulus of the vector of synaptic weights for two
competing perceptrons with $N = 10$ inputs, as a function of
time. Full and dotted curves correspond to $|\mathbf{w}^1|$ and $|\mathbf{w}^2|$,
respectively. Full and dashed horizontal lines correspond to
the analytical approximation for the average value of $|\mathbf{w}|$ for
$N = 10$ and $N \to \infty$, respectively.
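
The period-2 behavior of the reduced map, Eq. (9), is easy to check by direct iteration. The following is a minimal sketch under stated assumptions: the helper name `iterate_pq`, the initial values, and the convention $S(0) = +1$ are illustrative choices. Exact rational arithmetic is used so that the cycle can be tested with strict equality.

```python
from fractions import Fraction


def iterate_pq(p, q, eta, steps):
    """Iterate Eq. (9) for a fixed input vector and return the trajectory."""
    trajectory = []
    for _ in range(steps):
        s = 1 if p >= 0 else -1  # assumption: S(0) = +1
        # Both right-hand sides use the old value of p.
        p, q = p - (1 - eta) * s, q - 2 * eta * abs(p) + eta * (1 - eta)
        trajectory.append((p, q))
    return trajectory


if __name__ == "__main__":
    N = 10
    eta = Fraction(1, N + 1)
    # Illustrative initial condition with q >= p**2 / N.
    traj = iterate_pq(Fraction(37, 10), Fraction(3), eta, 50)
    (pa, qa), (pb, qb) = traj[-2], traj[-1]
    p1, p2 = max(pa, pb), min(pa, pb)
    print("cycle values of p:", p1, p2)
    print("p2 == p1 - 1 + eta:", p2 == p1 - 1 + eta)
    print("cycle values of q:", qa, qb)
```

After the transient the pair $(p, q)$ repeats every two steps, with the two values of $p$ lying in the intervals quoted in the text.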

In summary, for fixed $\mathbf{s}$ the evolution given by Eq. (8)
drives the synaptic weights towards a bounded domain
whose size depends on the initial condition but which is
always finite. We stress that this is valid for any choice
of $\mathbf{s}$. Coming now back to the case of variable inputs,
we note that the number of possible values for $\mathbf{s}(t)$ is
also finite, and equals $2^N$. Equation (8) can therefore
be thought of as the application, at each time step, of
one of the $2^N$ transformations just studied. Since each
of them contracts the space of synaptic weights towards
a bounded region, after the transient $\mathbf{w}(t)$ will always
evolve within the union of all those regions. Disregarding
transient effects, the space of synaptic weights is then
finite. Hence, the accessible phase space of mapping (5)
is finite and discrete.

As an illustration of the evolution of synaptic weights,
we show in Fig. 2 the time dependence of $|\mathbf{w}|$ for both
perceptrons. The initial weights were chosen at random
from a uniform distribution in $(-0.2, 0.2)$, and $N = 10$.
The horizontal lines in the plot stand for the theoretical
values of the temporal average of $|\mathbf{w}|$ for $N = 10$ (full line)
and $N \to \infty$ (dashed line). These can be calculated by
taking the square of Eq. (8), namely,

$$
q(t+1) = q(t) - 2\eta \sqrt{q(t)}\, |\hat{\mathbf{w}}(t) \cdot \mathbf{s}(t)| + \eta(1 - \eta)
\tag{10}
$$

with $\hat{\mathbf{w}} = \mathbf{w}/|\mathbf{w}|$. This recursion equation is analogous
to the second of Eqs. (9). It can be seen that, for sufficiently
large $N$, the average of $|\hat{\mathbf{w}}(t) \cdot \mathbf{s}(t)|$ over time -or,
equivalently, over random realizations of the vectors $\hat{\mathbf{w}}$
and $\mathbf{s}$- becomes independent of $N$ and approaches the
limit $\langle |\hat{\mathbf{w}} \cdot \mathbf{s}| \rangle = \sqrt{2/\pi} \approx 0.798$. From Eq. (10), this
implies that for large $N$:

$$
\langle |\mathbf{w}| \rangle = \langle \sqrt{q} \rangle = \sqrt{\pi/8} \approx 0.627.
\tag{11}
$$

A better approximation for finite $N$ is $\langle |\mathbf{w}| \rangle =
\sqrt{\pi N (N - 1) / 8 (N + 1)^2}$. For $N = 10$ this gives $\langle |\mathbf{w}| \rangle \approx
0.540$, which is the value plotted in Fig. 2. The average
value of $|\mathbf{w}|$ provides an estimate for the size of the domain
of phase space where the synaptic weights evolve
after transients have elapsed. Note that the fact that
$\langle |\mathbf{w}| \rangle$ approaches a constant for large $N$ implies that, on
average, the synaptic weights are $w_i^{1,2} \sim 1/\sqrt{N}$.

FIG. 3. Average periods of the orbits of mapping (5),
for different values of $N$. Notice the logarithmic scale on the
vertical axis.
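
The numbers quoted above can be reproduced with a short calculation. This sketch is only a plausibility check under explicit assumptions: the helper names are hypothetical, and the Monte Carlo estimate draws the direction of $\mathbf{w}$ from an isotropic Gaussian, which stands in for "random realizations" and is not taken from the paper.

```python
import math
import random


def mean_abs_projection(N, samples, seed=0):
    """Monte Carlo estimate of <|w_hat . s|> for random directions w_hat
    (assumption: isotropic Gaussian) and random +-1 input vectors s."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        w = [rng.gauss(0.0, 1.0) for _ in range(N)]
        norm = math.sqrt(sum(x * x for x in w))
        s = [rng.choice((-1, 1)) for _ in range(N)]
        total += abs(sum(x * y for x, y in zip(w, s))) / norm
    return total / samples


def mean_modulus(N=None):
    """Eq. (11) for N -> infinity (N=None), or the finite-N approximation."""
    if N is None:
        return math.sqrt(math.pi / 8)
    return math.sqrt(math.pi * N * (N - 1) / (8 * (N + 1) ** 2))


if __name__ == "__main__":
    print("sqrt(2/pi)           :", math.sqrt(2 / math.pi))
    print("Monte Carlo, N = 100 :", mean_abs_projection(100, 4000))
    print("<|w|>, N -> infinity :", mean_modulus())
    print("<|w|>, N = 10        :", mean_modulus(10))
```

The last two lines evaluate the closed-form expressions of the text, which round to 0.627 and 0.540.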

The main consequence of the fact that the phase space of
our system is finite and discrete is that, after the transient
has elapsed, the orbits will be periodic. It becomes
therefore relevant to determine the length of the periods.
In fact, if orbits turned out to get typically trapped in
short cycles, the problem would at once become uninteresting.
We have measured the periods numerically, carrying
out extensive series of 100 to 1000 realizations, each $6.4 \times 10^5$
steps long, with $N$ ranging from 2 to 10. Initial conditions
were chosen at random, with the synaptic weights
uniformly distributed in $(-0.2, 0.2)$. The system has always
been found to reach a periodic orbit for $N < 7$. For
a fixed value of $N$, periods typically show a broad distribution.
The average period has been found to increase
exponentially with $N$, as shown in Fig. 3. For $N > 7$,
not all the realizations displayed periodicity, indicating
the occurrence of periods longer than our numerical realizations.
Some test realizations for $N = 10$ suggest that
periods could grow beyond $10^8$ steps.
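
For small $N$, a period can be measured by storing visited states and waiting for the first exact repetition. The sketch below is illustrative and rests on several assumptions not taken from the paper: weights are kept as exact rationals (initial values on a grid of spacing 1/1000) so that states can be compared without round-off, $S(0) = +1$, the initial inputs are random, and the names `step` and `find_period` are hypothetical.

```python
from fractions import Fraction
import random


def sign(x):
    return 1 if x >= 0 else -1  # assumption: S(0) = +1


def step(state, eta):
    """One iteration of mapping (5) on the state (s1, s2, w1, w2)."""
    s1, s2, w1, w2 = state
    o1 = sign(sum(w * s for w, s in zip(w1, s1)))
    o2 = sign(sum(w * s for w, s in zip(w2, s2)))
    if o1 == o2:   # II wins, the weights of I are modified
        w1 = tuple(w - eta * o2 * s for w, s in zip(w1, s1))
    else:          # I wins, the weights of II are modified
        w2 = tuple(w + eta * o1 * s for w, s in zip(w2, s2))
    return ((o2,) + s1[:-1], (o1,) + s2[:-1], w1, w2)


def find_period(N, seed=0, max_steps=50000):
    """Return (period, state on the cycle), or (None, last state)
    if no repetition is seen within max_steps."""
    rng = random.Random(seed)
    eta = Fraction(1, N + 1)
    state = (
        tuple(rng.choice((-1, 1)) for _ in range(N)),
        tuple(rng.choice((-1, 1)) for _ in range(N)),
        tuple(Fraction(rng.randint(-199, 199), 1000) for _ in range(N)),
        tuple(Fraction(rng.randint(-199, 199), 1000) for _ in range(N)),
    )
    seen = {}
    for t in range(max_steps):
        if state in seen:
            return t - seen[state], state
        seen[state] = t
        state = step(state, eta)
    return None, state


if __name__ == "__main__":
    for N in (2, 3, 4):
        period, _ = find_period(N)
        print("N =", N, "period =", period)
```

Storing every visited state is only practical for small $N$; for longer orbits a constant-memory cycle-detection scheme would be a more suitable choice.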

The system thus seems to have two well-differentiated
time scales. On the one hand, there should be a time
scale associated with learning, of order $N^3$. As stated
above, in the case of a single perceptron being trained
to predict a periodic series this is in fact the number of
steps needed to compute all the synaptic weights. For the
competing perceptrons, the length of the initial transient
during which the system explores phase space to find the
bounded region where it will evolve later should be of
the same order. On the other hand, we have a much
longer "recursion" time scale, of order $A^N$ ($A \approx 5.05$,
Fig. 3), associated with the periods of orbits inside that
region. Though the two-perceptron dynamics is dissipative,
it resembles in this aspect that of Hamiltonian
systems with many degrees of freedom. Indeed, according
to Poincaré's theorem [17], Hamiltonian systems are
recurrent and, at sufficiently long times, they visit an arbitrarily
small neighborhood of their initial state. However,
in a statistical description of their evolution, it is
possible to identify much shorter time scales, related to
the relaxation of fast variables [18].

At the level of recursion time scales, the dynamics of 
the two-perceptron system is in a sense trivial. Orbits are 
in fact periodic at long times, and the results of succes- 
sive game rounds will be repeated ad infinitum. When, 
during a whole period, one of the perceptrons is able 
to gain even the smallest advantage over the other, this 
small difference will continuously accumulate producing, 
in the long run, an arbitrarily large bias in the result 
of the game. As in the case of large Hamiltonian sys- 
tems, however, recursion times are far beyond the reach 
of our (numerical) experience as the size of the perceptrons
increases. Therefore, most of the realizations of
the two-perceptron game analyzed below are
restricted to the transient period, prior to the appearance
of periodicity. In this stage, the relevant time scale
is the learning time, of order $N^3$. Within such times we
expect the system to reach a kind of stationary playing 
regime where, if the learning algorithm is efficient, the 
outputs of the two perceptrons should imitate a random 
series of ±1. In the next section we study the statistical 
properties of these output series. 



IV. STATISTICAL ANALYSIS OF THE GAME 
DYNAMICS 

Random properties in time series can be characterized 
in a variety of ways. In our case, where the relevant 
series are arrays of ±1, a suitable measure of time correlations
is an information-like quantity [19]. As shown
below, this quantity can be used to characterize the cor- 
relation between different series and, consequently, the 
correlation of a series with itself. It has the advantage of 
being additive, and is therefore appropriate when com- 
paring numerical results. We thus begin by defining the 
mutual information of two time series. 

Consider two dichotomic stochastic processes $S_1$ and
$S_2$ that, at each time step, can adopt the values ±1
with certain probability distributions. Let $P(S_1, S_2)$ be
the joint probability for the processes, and $P_1(S_1) =
\sum_{S_2} P(S_1, S_2)$ and $P_2(S_2) = \sum_{S_1} P(S_1, S_2)$ their individual
(marginal [20]) probabilities. A measure of the
correlation between the two processes is given by the mutual
information [19], defined as

$$
I = \sum_{S_1 = \pm 1} \sum_{S_2 = \pm 1} P(S_1, S_2)\, \log_2 \frac{P(S_1, S_2)}{P_1(S_1)\, P_2(S_2)}.
\tag{12}
$$

It can be shown that $I \geq 0$. For two uncorrelated
processes, where $P(S_1, S_2) = P_1(S_1) P_2(S_2)$, the mutual
information reaches its minimum, $I = 0$. The
maximal value of the mutual information is obtained
for two identical stochastic processes, $S_1 = S_2$, where
$I = -P_1(+1) \log_2 P_1(+1) - P_1(-1) \log_2 P_1(-1)$. In particular,
if $P_1(+1) = P_1(-1) = 1/2$, we get $I = 1$.
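
As a concrete illustration of Eq. (12), the following sketch estimates the mutual information of two finite ±1 series by replacing probabilities with relative frequencies. The function name `mutual_information` is a hypothetical choice; the two test series are constructed so that the limiting cases discussed above appear exactly.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Frequency-based estimate of Eq. (12), in bits, for two
    equally long series of +-1 values."""
    T = len(a)
    joint = Counter(zip(a, b))
    count_a = Counter(a)
    count_b = Counter(b)
    info = 0.0
    for (x, y), n in joint.items():
        p_xy = n / T
        info += p_xy * math.log2(p_xy / ((count_a[x] / T) * (count_b[y] / T)))
    return info


if __name__ == "__main__":
    balanced = [1, -1] * 50
    # Identical balanced series: the maximal value, I = 1 bit.
    print(mutual_information(balanced, balanced))
    # Series whose joint frequencies factorize exactly: I = 0.
    a = [1, 1, -1, -1] * 25
    b = [1, -1, 1, -1] * 25
    print(mutual_information(a, b))
```

Pairs that never occur are simply absent from the counter, which implements the usual convention $0 \log 0 = 0$.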

The definition of mutual information, Eq. (12), suggests
immediately a way of introducing a measure of autocorrelation
for a single dichotomic stochastic process
$S$ at different times. In fact, associating $S_1(t)$ and $S_2(t)$
with $S(t)$ and $S(t + \tau)$, respectively, we can introduce the
(two-time) autoinformation as

$$
I(t, \tau) = \sum_{S(t)} \sum_{S(t+\tau)} P[S(t), S(t+\tau)]\,
\log_2 \left\{ \frac{P[S(t), S(t+\tau)]}{P[S(t)]\, P[S(t+\tau)]} \right\}.
\tag{13}
$$

If $S$ is a stationary stochastic process [20] the autoinformation
depends on the time interval $\tau$ only, $I = I(\tau)$.
If the successive values of $S$ are uncorrelated we have
$I = 0$, whereas for $\tau = 0$ we get the maximal value
$I(t, 0) = -P(+1) \log_2 P(+1) - P(-1) \log_2 P(-1)$.

In practice, for a finite realization of the stochastic
processes, the probabilities involved in Eqs. (12) and
(13) are approximated by the corresponding frequencies,
which can be computed by simple counting of the relevant
occurrences. This approximation implies that in
the case of uncorrelated processes the information can
differ from zero, due to fluctuations in the finite sample
under consideration. It can be shown that for a $T$-step
realization of uncorrelated stochastic processes where the
individual probabilities of the two possible values ±1 are
equal, $P(+1) = P(-1) = 1/2$, the probability distribution
for the information to have a value $I$ is

$$
p_T(I) = \sqrt{\frac{T \ln 2}{\pi I}}\; \exp(-T I \ln 2),
\tag{14}
$$

for small $I$. The resulting mean value of the information is

$$
\langle I \rangle = \int_0^{\infty} I\, p_T(I)\, dI = \frac{1}{2 T \ln 2},
\tag{15}
$$

which decreases as $T^{-1}$ as the series size grows. For large
$T$, $p_T(I) \to \delta(I)$, as expected. Thus, the distribution of
values for the information computed from finite samples
of size $T$ and its average are to be respectively compared
with $p_T(I)$ and $\langle I \rangle$ in order to detect the presence of
correlations.

FIG. 4. Average autoinformation for the output of one of
the competing perceptrons as a function of the time interval
$\tau$, measured in series of $10^4$ steps. Both perceptrons have
10 inputs. The horizontal line corresponds to the average
autoinformation expected for an uncorrelated series of the
same length.
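
The finite-sample baseline of Eq. (15) can be compared with a direct frequency count on pseudo-random series. This is a sketch under stated assumptions: the helper names are hypothetical, Python's built-in generator stands in for an uncorrelated process, and the series length and number of runs are chosen only to keep the example fast.

```python
import math
import random
from collections import Counter


def autoinformation(series, tau):
    """Frequency-based estimate of Eq. (13), in bits, for a +-1 series."""
    first = series[:len(series) - tau]
    second = series[tau:]
    T = len(first)
    joint = Counter(zip(first, second))
    c1 = Counter(first)
    c2 = Counter(second)
    # (n/T) / ((c1/T) * (c2/T)) simplifies to n*T / (c1*c2).
    return sum((n / T) * math.log2(n * T / (c1[x] * c2[y]))
               for (x, y), n in joint.items())


def uncorrelated_mean(T):
    """Eq. (15): expected information of an uncorrelated T-step sample."""
    return 1.0 / (2.0 * T * math.log(2.0))


def mean_random_autoinformation(T, tau, runs, seed=0):
    """Average autoinformation over several pseudo-random +-1 series."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        series = [rng.choice((-1, 1)) for _ in range(T)]
        total += autoinformation(series, tau)
    return total / runs


if __name__ == "__main__":
    print("Eq. (15) for T = 10**4:", uncorrelated_mean(10**4))
    print("Eq. (15) for T = 2000 :", uncorrelated_mean(2000))
    print("measured,   T = 2000 :", mean_random_autoinformation(2000, 3, 300))
```

For $T = 10^4$ the first line gives about $7.2 \times 10^{-5}$ bits, the reference level against which the measured autoinformation of the perceptron outputs is to be compared.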

We now consider two playing perceptrons with $N = 10$,
and apply the definition of autoinformation to any of
the two series of outputs, $S(t) = \sigma^{1,2}(t)$. The outputs are
recorded after the first $10^4$ steps have elapsed, in order
to avoid nonstationary transient effects during the first
stage of learning (of order $N^3$). The recorded series
are $T = 10^4$ steps long, and the results presented below
correspond to averages over $5 \times 10^4$ realizations.

Figure 4 shows the measured average autoinformation
as a function of $\tau$. The horizontal line corresponds to
the average autoinformation (15) expected for an uncorrelated
series with the present value of $T$, i.e. $\langle I \rangle \approx
7.2 \times 10^{-5}$. We first note that, except for $\tau = 2$ and 4, the
autoinformation of the output signal is always less than
twice the value of $\langle I \rangle$ for a random series. This implies
that each perceptron exhibits a quite good performance
in generating a random sequence. There are however certain
regular patterns that suggest the presence of small
but nontrivial correlations. Indeed, the average autoinformation
oscillates strongly for small $\tau$, reaching high
levels for even values of $\tau$ and dropping abruptly for odd
values of $\tau$. On average, these oscillations decrease as $\tau$
grows, but they reappear near $\tau = 20$ and 30. Realizations
for other values of $N$ indicate that the oscillation
amplitude decreases as $N$ grows, and that the "bursts"
at which oscillations reappear occur when $\tau$ approaches
integer multiples of $N$. The amplitude of these bursts
decreases for larger multiples.



FIG. 5. Normalized frequencies of autoinformation values
obtained from $5 \times 10^4$ series of $10^4$ steps, for several values of
the time interval $\tau$. Both perceptrons have 10 inputs. Curves
correspond to the distribution expected for uncorrelated series
of the same length, Eq. (14).

A more detailed description of the appearance of correlations
in the output signals of the perceptrons is provided
by the distribution of autoinformation values. Figure 5
displays the normalized frequencies of autoinformation
values resulting from our sets of $5 \times 10^4$ realizations
of $10^4$-step series for various values of $\tau$. The curve corresponds
to $p_T(I)$ for an uncorrelated series, Eq. (14). For
$\tau = 1$ practically no correlations are detected by the autoinformation.
We note only a slight overpopulation for
large $I$. On the other hand, for $\tau = 2$, which corresponds
to the largest deviation in the average autoinformation
(see Fig. 4), the distribution is qualitatively different. It
exhibits a maximum at a rather large value of the autoinformation
($I \approx 2 \times 10^{-3}$) and, except for small values of
$I$, it is systematically much larger than the distribution
expected for a random series. At $\tau = 10$, the distribution
has a profile similar to that observed for $\tau = 1$, but
the overpopulation at the tail is noticeably larger. This
overpopulation grows further during the bursts where oscillations
reappear. The plot for $\tau = 20$ shows the distribution
at the first of these bursts. In contrast, for
the intermediate values at which the average autoinformation
plotted in Fig. 4 reaches the information of a
random series, the corresponding distribution cannot be
distinguished from $p_T(I)$.

We have found that the oscillations of the average autoinformation
shown in Fig. 4 are essentially a byproduct
of the internal dynamics of each perceptron. In fact,
if instead of using the opponent's output, a perceptron
is fed with a random series of ±1, the autoinformation
of its own output oscillates as well. A detailed analysis
of the output series reveals that, for even $\tau$, the product
$\sigma(t)\sigma(t+\tau)$ is more frequently negative than positive. For
instance, for $\tau = 2$, the respective frequencies are about
0.52 and 0.48. We remark in passing that this small relative
difference -of the order of a few percent- produces
an increment larger than one order of magnitude in the
autoinformation, which shows the sensitivity of this
quantity as a measure of correlations. For larger values
of $\tau$, the difference is even smaller. On the other hand,
for odd $\tau$ no differences are detected.
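
The remark on sensitivity can be checked with a small calculation. This sketch assumes, for illustration only, that both output values are equally probable, so that the joint distribution is fixed by the frequency of negative products; the function name `info_from_bias` is hypothetical.

```python
import math


def info_from_bias(p_negative):
    """Information (in bits) between S(t) and S(t + tau) when both values
    are equally likely and the product is negative with probability
    p_negative (assumed symmetric joint distribution)."""
    joint = {
        (+1, +1): (1 - p_negative) / 2,
        (-1, -1): (1 - p_negative) / 2,
        (+1, -1): p_negative / 2,
        (-1, +1): p_negative / 2,
    }
    # Marginals are 1/2 each, so P1 * P2 = 1/4 for every pair.
    return sum(p * math.log2(p / 0.25) for p in joint.values() if p > 0)


if __name__ == "__main__":
    T = 10**4
    baseline = 1.0 / (2.0 * T * math.log(2.0))  # Eq. (15)
    biased = info_from_bias(0.52)
    print("information for a 0.52/0.48 split:", biased)
    print("ratio to the uncorrelated mean   :", biased / baseline)
```

With frequencies 0.52 and 0.48 the information is about $1.15 \times 10^{-3}$ bits, roughly 16 times the mean expected for an uncorrelated series of $10^4$ steps, in line with the statement in the text.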

In order to trace the origin of the correlations observed
for even $\tau$, a careful analysis of the learning algorithm has
to be carried out. We consider first the case of $\tau = 2$.
After two time steps, the vector of synaptic weights can
be written as

$$
\mathbf{w}(t+2) = \mathbf{w}(t) - \eta\, \theta(t)\, \sigma(t)\, \mathbf{s}(t)
- \eta\, \theta(t+1)\, \sigma(t+1)\, \mathbf{s}(t+1),
\tag{16}
$$

where $\theta(t) = 1$ if the weights have been modified at time
$t$, and $\theta(t) = 0$ otherwise [cf. Eq. (8)]. When the perceptron
is fed with a random signal, $\theta(t)$ can be seen as
a stochastic process with equal probabilities for its two
values. The product of the outputs two steps apart is

$$
\begin{aligned}
\sigma(t)\sigma(t+2) &= \sigma(t)\, S[\mathbf{w}(t+2) \cdot \mathbf{s}(t+2)] \\
&= S[\sigma(t)\, \mathbf{w}(t) \cdot \mathbf{s}(t+2) - \eta\, \theta(t)\, \mathbf{s}(t) \cdot \mathbf{s}(t+2) \\
&\qquad - \eta\, \theta(t+1)\, \sigma(t)\sigma(t+1)\, \mathbf{s}(t+1) \cdot \mathbf{s}(t+2)].
\end{aligned}
\tag{17}
$$

Numerical measurements of the right-hand side (r.h.s.)
of this equation show that the first two terms in the argument
of the sign function have zero mean and do not
produce a net contribution to the sign of $\sigma(t)\sigma(t+2)$.
The only contribution to the correlation originates in
the third term. To verify this fact analytically, we first
note that



w(t 
s(t-( 



= w(i)[l + 0(l/VW)], 
s(t + 2) = s(i) • s(t+ 1)[1 



0(1/VN)] 



(18) 



The first of these identities results from the fact that, as 
shown in the previous section, io, ~ 1/s/N whereas, ac- 
cording to Eq. (^) , its variation in one time step is given 
by i] ~ 1/N. The second identity can be readily proven 
from the evolution of Si(t), also given in Eq. (|J). Conse- 
quently, neglecting terms of order l/y/N, the sign of the 
product a(t)a(t+ l)s(i+l)-s(t+2) can be approximated 
as follows: 



S[cr(t)cr(£ + l)s(t + 1) ■ s(* + 2)] 

« S{[w(t) ■ s(<)][w(t) • s(t + l)][s(t) • s(t + 1)]}. 



(19) 



Note that the argument of the sign function in the r.h.s. of
this equation is likely to be positive, since it is given by
the product of the projections of two vectors, $\mathbf{s}(t)$
and $\mathbf{s}(t+1)$, along the direction of $\mathbf{w}(t)$,
times their mutual scalar product. More explicitly,

$$
[\mathbf{w}(t)\cdot\mathbf{s}(t)][\mathbf{w}(t)\cdot\mathbf{s}(t+1)][\mathbf{s}(t)\cdot\mathbf{s}(t+1)]
= s_w^2(t)\,s_w^2(t+1) + s_w(t)\,s_w(t+1)\,\mathbf{s}'(t)\cdot\mathbf{s}'(t+1),
\tag{20}
$$

with $s_w = \mathbf{w}\cdot\mathbf{s}$ and
$\mathbf{s}' = \mathbf{s} - s_w\mathbf{w}$. The first term in
the r.h.s. of this equation is always positive, whereas the
second term is not expected to have a definite sign on
average. Note moreover that the first term is of the order of
unity, whereas the second term is of order $\sqrt{N}$. This
implies that the relative importance of the positive
contribution decreases as $N$ grows. Coming now back to
Eq. (17) through Eq. (19), it is clear that, when the synaptic
weights are modified at time $t+1$ [i.e. $\theta(t+1) = 1$],
there is on average a negative contribution to
$\sigma(t)\sigma(t+2)$, in agreement with numerical results.
According to the above analysis, this correlation should
become less important as $N$ grows. In fact, the
autoinformation peak at $\tau = 2$ is observed to decrease
with $N$ in the simulations.

For arbitrary $\tau$, the analysis can be repeated mutatis
mutandis. We have

$$
\sigma(t)\sigma(t+\tau) = S\Big[\sigma(t)\,\mathbf{w}(t)\cdot\mathbf{s}(t+\tau)
- \eta\sum_{t'=0}^{\tau-1}\theta(t+t')\,\sigma(t)\sigma(t+t')\,
\mathbf{s}(t+t')\cdot\mathbf{s}(t+\tau)\Big].
\tag{21}
$$

Taking now into account that

$$
\begin{aligned}
\mathbf{w}(t+t') &= \mathbf{w}(t)\,[1 + O(\sqrt{t'/N})], \\
\mathbf{s}(t+t')\cdot\mathbf{s}(t+\tau) &= \mathbf{s}(t)\cdot\mathbf{s}(t+\tau-t')\,[1 + O(\sqrt{t'/N})],
\end{aligned}
\tag{22}
$$

the sign of the product
$\sigma(t)\sigma(t+t')\,\mathbf{s}(t+t')\cdot\mathbf{s}(t+\tau)$
in the sum of Eq. (21) can be approximately written as

$$
S[\sigma(t)\sigma(t+t')\,\mathbf{s}(t+t')\cdot\mathbf{s}(t+\tau)]
\approx S\{[\mathbf{w}(t)\cdot\mathbf{s}(t)][\mathbf{w}(t)\cdot\mathbf{s}(t+t')][\mathbf{s}(t)\cdot\mathbf{s}(t+\tau-t')]\}.
\tag{23}
$$

The argument of the sign function in the r.h.s. of this
equation has a positive contribution of the same type as in
Eq. (20) when $t + t' = t + \tau - t'$, i.e. for $\tau = 2t'$.
Therefore, in the realizations where $\theta(t+\tau/2) = 1$, a
negative contribution to $\sigma(t)\sigma(t+\tau)$ appears.
This of course requires $\tau$ to be even. Since other
contributions have no definite sign, peaks in the average
autoinformation are expected for even values of $\tau$, as
observed. Note moreover that the order $\sqrt{t'/N}$ of the
terms neglected in Eq. (23) increases with $t'$, i.e. with
$\tau$. This explains why the height of the peaks decreases as
$\tau$ grows.
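
As a rough numerical illustration of the argument above, the
following Python sketch checks the decomposition of Eq. (20)
and estimates how often its l.h.s. is positive for two values
of $N$. It is not the simulation used in this paper: the
function name `sign_argument_stats`, the random unit weight
vector, and the use of two independent random input states are
assumptions made only for this illustration.

```python
import numpy as np

def sign_argument_stats(n_inputs, n_samples=10000, seed=0):
    """Check the Eq. (20) identity and estimate P(l.h.s. > 0)."""
    rng = np.random.default_rng(seed)
    # Assumption: a random weight vector normalized to |w| = 1.
    w = rng.standard_normal(n_inputs)
    w /= np.linalg.norm(w)
    # Assumption: two independent random +/-1 input states per sample.
    s1 = 2.0 * rng.integers(0, 2, size=(n_samples, n_inputs)) - 1.0
    s2 = 2.0 * rng.integers(0, 2, size=(n_samples, n_inputs)) - 1.0

    a = s1 @ w                              # s_w(t)
    b = s2 @ w                              # s_w(t+1)
    c = np.einsum("ij,ij->i", s1, s2)       # s(t) . s(t+1)
    lhs = a * b * c

    # Components orthogonal to w: s' = s - s_w w.
    s1_perp = s1 - a[:, None] * w
    s2_perp = s2 - b[:, None] * w
    rhs = a**2 * b**2 + a * b * np.einsum("ij,ij->i", s1_perp, s2_perp)

    identity_holds = bool(np.allclose(lhs, rhs))
    frac_positive = float(np.mean(lhs > 0))
    return identity_holds, frac_positive

# Odd sizes are used so that s(t) . s(t+1) is never exactly zero.
ok_small, frac_small = sign_argument_stats(11)
ok_large, frac_large = sign_argument_stats(201)
print(ok_small, ok_large, frac_small, frac_large)
```

Under these assumptions the fraction of positive arguments is
expected to lie above 1/2 and to move towards 1/2 as $N$
grows, in line with the weakening of the correlations for
larger perceptrons.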

Along the same line of analysis, it is possible to explain
the bursts where the autoinformation peaks reappear. Now,
however, it is necessary to take into account both
perceptrons. In fact, the output of a single perceptron fed
with a random signal does not exhibit such bursts. They are
rather a consequence of the interaction between the two
perceptrons during the game. The analysis, whose details we
omit here, shows that the bursts originate in a kind of
bouncing effect in the transmission of information between
the opponents. This bouncing effect is attenuated as $\tau$
grows, and decreases for larger perceptrons, as observed in
the numerical simulations.

In summary, the statistical analysis of perceptron outputs
at time scales larger than the learning stage but much
shorter than the recursion times reveals that the perceptrons
are quite efficient players of the game of matching pennies.
Even with a relatively small number of inputs, i.e. with a
relatively short-ranged memory, their dynamics is able to
generate quasi-random mixed strategies. We recall that this
behavior originates spontaneously from the deterministic
learning algorithm with which each player is endowed to
outguess its opponent. Remaining correlations, which could in
principle be exploited by a "smarter" opponent to obtain a
net gain during the game, are overall small and can in fact
be reduced systematically by increasing the memory range.
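
The kind of correlation measurement summarized above can be
sketched in a few lines of Python. The definition of the
autoinformation used in this paper is given in an earlier
section; the sketch below assumes the standard mutual
information, in bits, between the output at time $t$ and at
time $t+\tau$, estimated from empirical frequencies. The
function name `autoinformation` and the two test signals are
hypothetical.

```python
import math
import random
from collections import Counter

def autoinformation(seq, tau):
    """Empirical mutual information (bits) between seq[t] and seq[t + tau].

    Assumes tau >= 1 and len(seq) > tau.
    """
    pairs = list(zip(seq[:-tau], seq[tau:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    info = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x p_y), written with raw counts.
        info += p_xy * math.log2(p_xy * n * n / (left[x] * right[y]))
    return info

# A period-2 signal is fully predictable; a fair coin is not.
alternating = [1, -1] * 500
rng = random.Random(0)
coin = [rng.choice((-1, 1)) for _ in range(10000)]
print(autoinformation(alternating, 2), autoinformation(coin, 1))
```

A signal close to the optimal mixed strategy should give
values near zero at every lag, whereas residual peaks such as
those discussed above would show up as small positive values
at the corresponding $\tau$.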

V. DISCUSSION 

We have here considered an example of a fully de- 
terministic learning system and explored its ability to 
behave stochastically. Concretely, we have coupled two 
deterministic perceptrons in such a way that they imi- 
tate two players of the game of matching pennies, trying 
to outguess each other. Since the optimal strategy for 
this game is a purely stochastic sequence of outputs, the 
learning process should lead the network dynamics to ap- 
proach a random signal. 
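
For orientation, the coupling described in this paragraph can
be sketched as follows. This is a minimal illustration and
not the code used for the results reported here. It assumes
that each perceptron reads the last $N$ outputs of its
opponent, that player A wins when the two outputs match and
player B when they differ, and that only the loser modifies
its weights, by $-\eta\,\sigma\,\mathbf{s}$ with $\eta = 1/N$.
The function name `play_matching_pennies`, the random initial
conditions, and the absence of weight renormalization are
further assumptions.

```python
import numpy as np

def play_matching_pennies(n_inputs=20, n_rounds=2000, seed=0):
    """Return both output sequences and the number of rounds won by A."""
    rng = np.random.default_rng(seed)
    eta = 1.0 / n_inputs  # learning rate of order 1/N

    def random_unit(n):
        v = rng.standard_normal(n)
        return v / np.linalg.norm(v)

    w_a, w_b = random_unit(n_inputs), random_unit(n_inputs)
    # Assumption: random initial histories of +/-1 values.
    s_a = rng.choice([-1.0, 1.0], size=n_inputs)  # input of A (outputs of B)
    s_b = rng.choice([-1.0, 1.0], size=n_inputs)  # input of B (outputs of A)

    out_a, out_b, wins_a = [], [], 0
    for _ in range(n_rounds):
        sig_a = 1 if w_a @ s_a >= 0 else -1
        sig_b = 1 if w_b @ s_b >= 0 else -1
        if sig_a == sig_b:
            # A wins on a match, so B (the loser) is corrected.
            wins_a += 1
            w_b = w_b - eta * sig_b * s_b
        else:
            # B wins on a mismatch, so A (the loser) is corrected.
            w_a = w_a - eta * sig_a * s_a
        # Shift the histories and store the newest opponent output first.
        s_a = np.concatenate(([sig_b], s_a[:-1]))
        s_b = np.concatenate(([sig_a], s_b[:-1]))
        out_a.append(sig_a)
        out_b.append(sig_b)
    return out_a, out_b, wins_a

out_a, out_b, wins_a = play_matching_pennies()
print(wins_a / len(out_a))
```

The whole evolution is deterministic once the initial weights
and histories are fixed; the seed only selects the initial
condition.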

In the first place, we have observed that a perceptron 
producing a periodic signal can always be defeated by 
a sufficiently "smart" opponent, i.e. by a perceptron 
with a sufficiently large number of neurons. This kind 
of "dummy" player provides in fact a linearly separable 
set of examples for the learning of its opponent || . The 
learning task is thus to find a plane in the input space
that separates the input states into two groups, namely
those whose expected output is +1 and those whose expected
output is -1. On the other hand, when the two competing
perceptrons are allowed to learn, the situation is quite
different.
Since both networks are looking for the best performance, 
they both change their strategies on line and, thus, they 
may well provide not only a nonlinearly separable set of 
examples, but also an inconsistent one. That is to say, 
at two different times any perceptron can give two dif- 
ferent outputs from the same input state. This is the 
reason why the learning process does, in fact, not con- 
verge, and why the system is expected to spontaneously 
develop stochastic-like dynamics. 
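
The linearly separable case mentioned at the beginning of
this paragraph can be illustrated with a short Python sketch.
It uses the classic perceptron correction rule as a stand-in
for the learning algorithm of this paper, and the names
`train_against_periodic`, `pattern` and `eta` are
hypothetical. When the number of inputs is at least the
period of the opponent, the expected output is itself one of
the inputs, so a separating plane exists and the perceptron
convergence theorem bounds the number of errors.

```python
def train_against_periodic(pattern, n_inputs, n_rounds=300, eta=1.0):
    """Per-round error flags of a perceptron predicting a periodic +/-1 signal."""
    period = len(pattern)
    signal = [pattern[t % period] for t in range(n_rounds + n_inputs)]
    w = [0.0] * n_inputs
    errors = []
    for t in range(n_inputs, n_rounds + n_inputs):
        # x[0] is the most recent output of the periodic opponent.
        x = signal[t - n_inputs:t][::-1]
        target = signal[t]
        activation = sum(wi * xi for wi, xi in zip(w, x))
        wrong = target * activation <= 0  # a tie counts as an error
        if wrong:
            # Classic perceptron correction (assumed rule).
            w = [wi + eta * target * xi for wi, xi in zip(w, x)]
        errors.append(wrong)
    return errors

pattern = [1, 1, -1, 1, -1, -1, -1]  # period 7
errors = train_against_periodic(pattern, n_inputs=8)
print(sum(errors), sum(errors[-100:]))
```

After a short transient the perceptron predicts every output
of the periodic player, i.e. it wins every round. No such
guarantee exists when the opponent also learns, since the set
of examples is then neither fixed nor consistent.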

Our main conclusion is that, despite the fact that the
overall dynamics is in the long run periodic, the perceptrons
do learn to behave quasi-stochastically over moderately long
time intervals. An information-theoretical statistical
analysis of the output signals shows slight time
correlations, to be ascribed to the deterministic coupling
between the learning mechanism and the outputs themselves,
which act as the inputs of the respective opponents. The
effect of these correlations is observed to decrease
gradually as the number of neurons in each perceptron grows.
Two seemingly paradoxical aspects of this learning process
deserve to be pointed out, because of their suggestive
similarity with learning in humans (or other animals)
entrained in a systematic activity such as a repetitive
competition game. In the first place, the mutual search for
regularity in the opponent's behavior leads the whole system
to develop highly irregular evolution over long times, which
can hardly be distinguished from purely random dynamics. In
the second place, we stress that competition can here be
interpreted as a form of mutually supervised learning and,
thus, results in a kind of collaboration between the
opponents.

Some natural extensions of the present model are worth
considering for future work. An important question to be
addressed concerns the case where the entangled perceptrons
are not equal in size, i.e. they have different numbers of
neurons. In such a situation, in fact, the above-mentioned
correspondence between competition and collaboration could
fail to hold. Preliminary results along this line (not
presented in this paper) suggest however that the advantage
of a larger perceptron is relatively small. Only very small
networks ($N \sim 2$) are systematically defeated by larger
opponents, as they typically fall into short-period cyclic
orbits.

The perceptron-like structure of our networks is prob- 
ably the simplest instance among a large class of possible 
architectures. Fully connected networks and multilayer 
structures have been shown to exhibit very high perfor- 
mance in learning tasks |2p||i1| . It would therefore be 
interesting to study how these more complex networks 
respond to mutually supervised learning. Finally, from 
the viewpoint of game theory, it would be relevant to 
analyze the dynamics of competing networks engaged in 
other games, especially when ordinary optimization
procedures do not lead to the optimal playing strategy. We
mention, in particular, the iterated prisoner's dilemma 
PH , which is attracting a great deal of attention as 
a paradigm of competition-collaboration interplay, and 
multiplayer minority games, recently studied by means 
of ensembles of globally coupled perceptrons [Q . Competing
neural networks could contribute to a better understanding
of the complex learning mechanisms involved in this kind of
social interaction.